A proposal: a comprehensive platform to characterize tumors in
Chinese and improve success in cancer drug discovery 
and development 

Abstract

Cancer is a collection of complex diseases in which cell proliferation and apoptosis are dysregulated 
due to the acquisition of genetic changes in cancer cells. These genetic changes, combined with the 
interrelated physiologic adaptations of neo-angiogenesis, recruitment of stromal support tissues, and 
suppression of immune recognition, are measurable characteristics in tumor gene expression profiles and 
biochemical pathways. These measures can lead to identification of disease drivers and, ultimately, can 
be used to assign therapy. With advances in RNA sequencing technologies, it is now possible to measure
all genetic and gene expression changes simultaneously with a single technology. The ability to
create a comprehensive catalog of genotypic and phenotypic changes in a collection of histologically 
similar but otherwise distinct tumors should allow for a more precise positioning of existing targeted 
therapies and identification of new targets for intervention. 

The incidence of cancer in China is high and
increasing. The most prevalent cancer types worldwide
are lung cancer, colorectal cancer, breast cancer, and
prostate cancer. The prevalence of epidermal growth
factor receptor (EGFR) mutations in female
non-smoking patients with lung cancer is higher in Asia
than on other continents. In addition, the increasing
incidence of gastric, esophageal, and hepatocellular
cancers in China and other Asian countries is
distinct from that in Western countries and represents
a significant unmet medical need. Smaller
cohorts of Asia-specific cancers include oral cancer in
India, bladder cancer in Taiwan, cholangiocarcinoma in
Thailand, and nasopharyngeal carcinoma in southern
China and Southeast Asia. Although progress has been
made worldwide in treating specific types of cancer, 
current cancer therapies have variable and generally low 
success rates, with the majority of patients, especially 
adults with carcinomas, receiving treatment that
remains ineffective. 

On a population level, cancer is a heterogeneous 
disease, which has traditionally been defined by 
histological characteristics and the tissue site of the 
primary tumor. Collectively, approximately 200 subtypes 
of cancer exist based on the current definition of disease, 
and it is estimated that over 1000 additional subtypes 
remain to be defined. 

Cancer is a disease class in which patients benefit 
from a personalized medicine approach. Gleevec and
Herceptin, which are both antagonists of specific
oncogenes, are multi-billion dollar drugs that were
developed in subpopulations of patients with histologically 
and genetically defined cancers. Oncogenes are defined 
as genes that are genetically deranged, either by 
mutation, amplification, or chromosomal translocation. 
The mutant genes or changes in gene expression create 
a gain of function, typically resulting in either constitutive 
activation of a protein (the drug target) or acquisition of 
neomorphic traits that drive tumorigenesis. The first gene
identified to have these characteristics was the myc
oncogene, followed shortly by the identification of mutant
K-Ras. Recognition of this basic concept of cancer
biology is reflected in the Nobel Prize in Physiology or
Medicine, which was awarded to Bishop and Varmus in
1989. Since then, efforts to translate this concept
into therapy have been prominent. This
fundamental rule of cancer biology has been defined
as "oncogene addiction", and the differential efficacy
of therapies on cancers with specific oncogene mutations
has been referred to as the "genetic therapeutic index".

Using Knowledge of Oncogenes and 
Tumor Suppressor Genes for Cancer 
Drug Discovery 

To date, over 300 candidate oncogenes and tumor 
suppressor genes have been proposed across multiple 
cancer types. Identification of new oncogenes, tumor 
suppressor genes, and pathways has been greatly 
enabled by whole genome or exon DNA sequencing and 
mRNA profiling technologies. Although either approach is 
quite useful, neither alone is sufficient to fully identify the 
causal or driver genes in a single tumor. As cancer is a 
disease of genomic instability, hundreds of genetic 
changes are observed in each tumor, with the majority 
being passenger rather than causal in nature. By 
definition, passenger mutations are neither diagnostic for 
the disease state, nor do they represent meaningful 
targets for drug discovery. Therefore, simply measuring 
gene expression levels or cataloging DNA mutations in a 
tumor is insufficient to understand the etiology of 
disease. A recent report describing the whole genome 
sequencing analysis of 11 breast and 11 colorectal
cancers identified an average of 80 DNA mutations that
result in amino acid changes in a tumor. Further
analysis suggested that fewer than 15 of these mutations
contribute to disease etiology. The identity and 
prevalence of individual mutations that drive critical 
pathways is variable from tumor to tumor. Thus, the 
significance of these changes needs to be verified from 
the same sample with an alternative method of 
measurement that captures the phenotypic changes 
accompanying driver mutations. In an integrated 
analysis, whole genome mRNA profiling and targeted 
DNA analysis for over 200 glioblastoma samples 
successfully stratified each histologically defined 
glioblastoma subtype on the basis of association with 
known oncogene drivers. Although this study was not
designed to identify new oncogenes for glioblastoma, the 
association of PDGFRA, EGFR, and IDH1 mutations 
with specific subtypes suggests that inhibitors for these 
targets should be positioned within known subtypes to 
achieve greater clinical efficacy. 



Ironically, although the oncogenes myc and ras 
have clearly been identified as drivers of cancer and are 
therefore excellent theoretical targets for therapeutic 
intervention, a therapy that directly targets either of these 
proteins has not been delivered to patients, despite 
considerable efforts over the last 20 years in industry 
and academia. An alternative approach to inhibiting 
"undruggable" oncogene targets is to target not the 
oncogene itself, but the gain-of-function pathways that 
consist of downstream effectors for oncogene function. A 
universal output of pathway perturbation is altered gene 
expression, which can be measured on well-established 
platforms of microchip arrays. Therefore, identifying
pathway signatures that represent oncogene activation in
a tumor setting may allow for identification of new targets
in otherwise intractable pathways. In the case where a
tumor is complex and driven by multiple oncogenic 
pathways, a whole genome approach will enable 
identification of all critical genes that are driving 
tumorigenesis. In this latter case, well defined 
gene-based pathway signatures can be used to assign 
both rationally designed monotherapies and combination
therapies.

Pathway Signatures Capture Genetic 
Output and Epigenetic Features of
Tumorigenesis

Although pathway signatures are complementary 
measures to genetic changes, they are also valuable 
tools in their own right. Altered gene expression is a 
universal output of pathway perturbation. Multiple 
methods for deriving pathway signatures from 
simultaneous whole genome mRNA measurements have 
been developed over the last decade, with many 
pathway signatures published and validated as 
representative measures of variant biological processes. 
Perhaps the disease most impacted by this approach is 
breast cancer, where multiple pathway signatures have 
been used to redefine the three major subclasses of 
disease (as defined by FISH status of Her2 and IHC 
measurement of ER and PR) into four new categories: 
Her2, luminal A, luminal B, and basal. The latter three
categories are defined by pathway signatures, which are
currently being used as experimental biomarkers in
clinical studies. The hope that these signatures will be
used in clinical practice is encouraged by the 2007 Food
and Drug Administration approval of an assay for
another pathway signature, the prognostic MammaPrint
assay. This landmark achievement is evidence that a
multi-analyte measurement of gene expression can be 
executed with rigor and reproducibility to meet clinical 
regulatory requirements. 

Using Signatures to Translate Knowledge
of Targets, Preclinical Functional
Data, and Clinically Relevant Diseases

Perhaps the greatest value of pathway signatures is 
the ability to link complex preclinical biology with clinical 
biology. Although simple cancers can be identified by 
single analyte biomarkers, either those that are already 
known or those that remain to be discovered, the more 
common and complex cases will require multi-analyte 
tools. The universal nature of microarray platforms has 
allowed for comprehensive measurement of genome 
expression in digital formats that enable identification of 
coordinated gene expression changes associated with 
different biological states and development of techniques 
to assign a numerical value to a pathway signature. This 
powerful approach allows comparison of biological 
samples from many different sources. The Connectivity 
Map, a post-array digital analysis tool, uses pathway 
signatures derived from microarray measurements to 
explain the mechanism of action for novel drug 
candidates, and multiple drug response signatures have 
been derived from comparing pre- and post- treatment 
samples in preclinical models. These signatures can
then be tested against array profiles of tumor samples to
identify patients with relevant biology.
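
As an illustration of assigning a numerical value to a pathway signature, the sketch below averages per-gene z-scores across the genes in a signature to give one score per sample. This is only one simple scoring scheme among many and is not taken from the studies cited here; the gene names, expression values, and the function names `zscores` and `signature_score` are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression across samples (mean 0, SD 1)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with no variation carries no information about sample order.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def signature_score(expression, signature_genes):
    """Return the average z-score of the signature genes for each sample.

    expression maps gene name -> list of expression values, one per sample.
    Signature genes missing from the expression table are skipped.
    """
    present = [g for g in signature_genes if g in expression]
    if not present:
        raise ValueError("no signature genes found in expression data")
    per_gene = [zscores(expression[g]) for g in present]
    n_samples = len(per_gene[0])
    return [mean(gene_z[i] for gene_z in per_gene) for i in range(n_samples)]


# Hypothetical log2 expression values for three samples (illustrative only).
expression = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],  # measured, but not part of the signature
}
scores = signature_score(expression, ["GENE_A", "GENE_B"])
```

Because each gene is standardized before averaging, highly expressed genes do not dominate the score, and samples from different sources can be ranked on a common scale. In this toy data the third sample receives the highest score.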

Another natural feature of measuring mRNA that is 
distinct from tumor gene mapping is that this approach 
measures the influence of oncogenes and tumor 
suppressor genes in specific context and also integrates 
signaling between the tumor and its microenvironment. 
Therefore, mRNA measurement has the potential to 
measure any biology that may override genetic 
determinants. For example, a pathway signature derived 
from overexpression of the K-Ras oncogene was used to 
identify tumors that have elevated pathway expression in 
the absence of the mutant oncogene, suggesting the 
presence of alternative and as yet undiscovered drivers of
disease. Furthermore, as an example of measuring
dominant signaling from the microenvironment, the
mesenchymal subtype in the glioblastoma experiment
mentioned earlier was dominated by a pathway signature
that had significant overlap with immune regulation and 
was distinct from one found in normal brain tissue, 
suggesting the presence of pro-inflammatory cells in this 
subtype. 
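
The K-Ras example amounts to cross-referencing a per-tumor pathway score with mutation status. The following is a minimal sketch under stated assumptions: the sample records, the threshold of 1.0, and the function name `pathway_high_mutation_negative` are all hypothetical and serve only to show the logic.

```python
def pathway_high_mutation_negative(samples, threshold):
    """Return IDs of tumors whose pathway score exceeds the threshold
    although no mutation in the reference oncogene was detected."""
    return [
        s["id"]
        for s in samples
        if s["score"] > threshold and not s["mutant"]
    ]


# Hypothetical tumors: a pathway signature score plus oncogene mutation status.
samples = [
    {"id": "T1", "score": 1.8, "mutant": True},    # pathway on, mutation found
    {"id": "T2", "score": 1.5, "mutant": False},   # pathway on, no mutation
    {"id": "T3", "score": -0.4, "mutant": False},  # pathway off
]
candidates = pathway_high_mutation_negative(samples, threshold=1.0)
```

Tumors returned by this filter (here only `T2`) are the ones in which an alternative driver of the same pathway might be sought.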

The limited value of preclinical models of cancer is 
evidenced by the high response rate demonstrated in 
preclinical studies by clinical candidate molecules, 
followed by a high failure rate in the clinic due to lack of 
efficacy. One way to improve these models is to better
represent the heterogeneity of disease by moving a wide 
variety of tumor samples into preclinical platforms. 
Derivation of primary tumor samples into novel in vivo
models is a well-described method of creating new
preclinical models of cancer. The success rate of propagating
primary tumor samples as xenografts in
immune-compromised mice ranges from 10% to 30%, depending
on the tumor type and technique used. If the same
tumor samples that are fully characterized in this 
proposed study can also be propagated in vivo or ex 
vivo, functional studies with novel candidate therapies on 
samples with full genomic and genetic characterization 
will enable discovery of selection biomarkers to enrich for 
responder populations for these novel therapies. 
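
The 10% to 30% propagation rate has a direct consequence for how many primary samples must be collected. As a rough planning sketch, the number of samples needed is the target number of models divided by the take rate, rounded up. The target of 20 models and the function name `samples_needed` are hypothetical, and the calculation treats the take rate as a fixed average rather than modeling sample-to-sample variation.

```python
import math


def samples_needed(target_models, take_rate_percent):
    """Estimate how many primary tumor samples must be implanted to expect
    the target number of established xenograft models."""
    if not 0 < take_rate_percent <= 100:
        raise ValueError("take_rate_percent must be in (0, 100]")
    return math.ceil(target_models * 100 / take_rate_percent)


# Hypothetical goal of 20 established models at the two ends of the range.
at_low_rate = samples_needed(20, 10)
at_high_rate = samples_needed(20, 30)
```

At a 10% take rate the hypothetical goal requires about 200 implanted samples, against about 67 at 30%, which shows how strongly tumor type and technique affect the scale of sample collection.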

Gastric Cancer: An Example of Unmet 
Medical Need 

An example of a cancer that is prevalent in China 
with unmet medical need is gastric or gastrointestinal 
cancer. The prevalence of this disease is far greater in
Asia than in the West, and it is the second leading cause
of cancer death in the world. Much work has been done
to characterize this disease. Currently, gastric cancers 
are stratified into two histological groups, diffuse-type 
and intestinal-type adenocarcinoma, and both categories 
are associated with chronic inflammation caused by H. pylori, a
common stomach infection. Multiple studies with small
sets of tumor samples reveal changes in oncogenes
that have already been identified in other cancer
types, including K-Ras, BRAF, and PI3K mutations,
PTEN deletions, and MET and HER2 amplifications,
which have all been observed in some clinical samples.
These data, in combination with the identification of point
mutations in the E-cadherin gene within a family with
inherited predisposition for this disease, suggest critical
roles for growth factor signaling and the Wnt-β-catenin
pathways. Interestingly, outer membrane proteins of H.
pylori have been shown to interact with proteins from
both of these pathways. For the majority of cases,
however, no oncogene or pathway driver is known.
Treatment and response rates for gastric cancer in 
China are variable. Standard of care for treatment in the 
US includes the use of cytotoxic therapies (docetaxel, 
cisplatin, 5-fluorouracil) and radiation; 5-year
overall survival is less than 25% for all patients and less
than 4% for those with advanced stage disease.
Therefore, the majority of patients who are diagnosed
with mid to late stage gastric cancer carry a disease that 
has both an unknown etiology and a dismal prognosis. It 
can be argued, then, that a non-hypothesis driven 
characterization of gastric cancer samples can lead to 
the identification of the unknown drivers of the majority of 
this disease. The unusual role of H. pylori in this disease 
warrants an investigation of the host-microbe 
relationship. Genome-wide association studies to identify 
genomic vulnerabilities for this disease have been 
reported and are ongoing but data sets composed of 
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significant numbers of fully cliaracterized tumor samples 
from gastric cancer patients are not yet available. 



The Proposal 

Cancers in Chinese populations represent an unmet
medical need. Initial clinical evaluation of new therapies
in cancers that are prevalent in the Western population is
the standard in pharmaceutical development and delays
demonstration of their possible use in the Asian
population. We propose to accelerate the treatment of
cancers in Chinese populations by creating a
comprehensive characterization of these diseases. The
combination of both genotypic and phenotypic profiling
should enable 1) identification of all candidate oncogene
drivers and tumor suppressor genes by measuring all
genetic changes in a single tumor; 2) identification of
aberrantly activated pathways that correspond to these
genetic changes by using phenotypic whole genome
profiling as well as biological measurements and
statistical methods to identify causality; and 3)
discrimination between driver mutations and passenger
mutations to identify new targets and biomarkers for drug
discovery. To date, this approach has not been applied to
cancers in Chinese populations, as the deployment of
both genotypic and phenotypic profiling technologies on
a single patient sample is limited by patient sample size,
the cost of deploying two expensive technologies on a
single patient sample, and the unavailability of systematic
analytical methods and software that enable analysis of
data from two different kinds of data sets. More recently,
the development of quantitative RNA sequencing, using 
second generation nucleic acid sequencing technology, 
offers the promise of improved dynamic range and
fidelity of measurement for mRNA, and enables
simultaneous detection of genetic changes and
relevant mRNA gene expression levels. Thus, the new
development of accurate and quantitative RNA 
sequencing technology is an attractive alternative for 
characterizing a biological sample using a single 
measurement, revealing both the candidate driver genes 
and activated pathways for this disease. 

Most cancer patients do not respond to therapy or 
eventually develop a cancer that is resistant to current 
therapies. It is therefore reasonable to assume that 
these patients have cancers that are complex and are 
unlikely to be defined by a single analyte biomarker. The
application of genomic methods both to reveal DNA
sequence and to calculate quantitative digital values for
gene expression has the potential to improve our 
understanding of patient subpopulations with the most 
common and complex forms of these diseases. 

Regardless of the technologies chosen to 
characterize patient samples, it is clear that an integrated 
whole genome and genomic approach, combining both 
DNA and RNA measurements, is critical to discovering 
the root causes of these complex and fatal disorders. 
These datasets will naturally attract rational drug 
discovery efforts, and derivation of new drug targets and 
clinical biomarkers will enable rapid demonstration of
efficacy in a molecularly-defined subpopulation of 
responders to existing and future therapies. Once 
completed, these datasets will reside in silico and in
perpetuum, for scientists and oncologists to interpret and 
use to guide the rational implementation of novel 
treatments into effective standards of care. 